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Abstract 

In this paper the authors present a toolkit of C language functions 
'that can be linked with SIMSCRIPT programs to provide the data 
communication primitives necessary for distributed simulation. The 
authors' test case is discussed and some timing data are presented. 
Additionally some metrics, developed to determine the applicability 
of the server model decomposition for particular simulations, are dis- 
cussed. 


1 Why Distribute Simulations 

Computer simulations are often computationally intensive tasks requiring 
long runs in order to obtain useful results. The runtime requirements of 
a simulation model can be a problem both during model development and 
validation and while performing production runs of the simulation. 

The development of computer models to simulate a real world objects 
is a well understood problem and number of tools exist to aid the model 
developer [1]. Often times, the initial runs of a simulation model provide 
more questions than answers and the focus of study is changed. This re- 
sults in an evolutionary process for model development, with refinements 
directed at different attributes as the object or its environment becomes 
better understood. Often the complexity of the model also increases during 
this process. 

As the model is evolving, many runs may be needed to better understand 
the object and to verify the correctness of the simulation. The runtime 
requirements of complex models can greatly increase the time heeded to 
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Onro a model It ;is ** vt it vcmI In the point that production runs are being 
the runtime recpiiremen Is again minr min play. Often I, Ik* output 
finin e,i< li miii may only consist <>! a single data point. for a graph. In t.his 
rasr multiple rims of I In* same simulat ion wit h different inputs are needed. 

I his ran form I Ik* investigator to reduc e t Ik* number of dal a points collected 
in order to reduce t lie* time needed to genera i.e a graph. resulting in incorrect 
conclusions about Uie simulated object. 

Complementary to the problem of long, computationally intensive, run- 
times is the fart that, many times other computers arc* sitting idle and can 
provide basically free processor cycles to the simulation. In an effort to 
utilize sonic* ol these live cycle's, much research has gone into developing 
algorithms for performing a single simulation on a number of lose) y coupled, 
cooperating processors. 

I he idea of distributing a simulation model among cooperating proces- 
sors involves difficult problems. Princ ipal among these arc* the* identification 
of an effective decomposit ion ol the* simulation mode*!, and the maintenance 
of processor synchronization lo insure that tin* program is being executed in 
the correct order. I he* use* of very tightly coupled functions and dependence 
on shared data, common in simulation programs, makes these problems are 
very anile to distributed simulation. 

Significant research lias gone into these two problems and the results arc* 
promising depending on the model being simulated, ft is not our intention 
to address these problems in I his paper, but rather to select, an effective de- 
composition and synchronization scheme that will be* used lo demonstrate 
distributed simulation using our communication toolkit and standard simu- 
lation and operating system tools. A comprehensive treatment of the pro- 
cessor synchronization and model decomposition problems c an bo found in 
[ 2 , 3/1 J. The problem of processor synchronization is more easily solved in 
very tightly coupled processors that support very high speed communication 
[•>>(* 

2 Model Decomposition 

The model decomposition most easily supported by the tools here is to 
distribute some special l vpc*s of model components, here* called servers and 
receivers, on different machines. The term server is borrowed from object 
oriented design: a component is a server submodel if it can be* represented 
;us only sending to other model components. 
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This decomposition r;m he thought nf as a co||e< linn of data servers and 
and receivers with no cycles. Willi no cycles synchronization heroines easier 
and I he jnnhlcius wil.h deadlork such as that described in [7] is avoided. This 
is an easy and usually useful decompos'd inn lor complex, lightly coupled 
models, because the extensive data interaction among the model's parts 
are not interfered with. Ollier, potential more effective decompositions an* 
outside the scope of this paper. 

High performance scienlilic worksta.l ions sharing a LAN ;uc becoming 
common. Since shared memory is not available and message passing among 
workstations in the network is slow, a decom position of the simulation task 
is only likely to to be effective if message's are in frequently passed among 
workstations. 'This toolkit has been developed with these constraints in 
mind. 

The tools support a ’’warehouse" approach. Information to be sent to 
receivers is “bate heel" and sen! periodically as a single 1 large 1 message rather 
that as several smaller messages. In addition, the 1 receiver workstation main- 
tains inventories of dal a from servers. and based on ;i u I ic j paled consuiuptiou 
“orders'' additional data periodically so that new data should arrive before 
current supplies are exhausted. 

If data provided by (lie 1 servers requires significant computation and data 
sent to the receivers also requires computing time, than significant par* 
a Holism can result since the required compulation is off loaded from the 
receiver workstation . No possibility of deadlock exists in I his approach and 
syne hrouizaiiou is particularly simple. 


3 Distributed Simulation with Standard Tools 

In this paper we 1 present a Inolkil of functions I hat allows distributed sim- 
ulation to be carried out in a. loosely coupled, general purpose, workstation 
environment without the use of special purpose operations systems, pro- 
gramming languages, or hardware. 

The environment for which this soft ware was developed contains a collec- 
tion of Sun workstation computers connected via an ethernet LAN. These 
arc very loosely coupled UNIX workstations with no shared memory and 
only communicate via a shared bus (the ethernet LAN). SIMSCRIPT was 
selected as the simulation language because of it wide use in the simulation 
community. All processors cooperating in Hie simulation run programs writ- 
ten in SIMSC1UPT and call external functions for processor communication. 



The processor roinmimic;ili<m functions ;nc provided vi;i the UNIX Inter- 
process Coininti nicalion (I PC) functions [xj. 'These* functions are standard 
with the MSI) UNIX operating system and allow com mu niration among 
processes both within tin* same* computer and those residing on separate 
computers. Because the I PC functions are designed to provide communica- 
tion among a large number of varying processor types, a significant amount 
of overhead is inherent with data communication. This could he reduced 
by writing replacement functions that only provide for the needs of this 
simulation, however a design goal was to use as little custom software as 
possible. 


4 The Toolkit 

The toolkit consists of a collection of functions, written in the C language 
and linkable with SIMSCUIPT programs. The basic functions provided by 
the toolkit are interprocess data communications and sufficient processor 
synchronization to allow a simulation to l>r broken into a collection of data 
servers and receivers. 

To use these tools, one must lirsl determine whal in Ilnur model can lx* 
t hought of as a source or generator of precompulahle objects. In order for an 
object to bo precomputed, no information about current simulation time or 
access to local variables can be required. The most obvious precompn table 
object is random numbers, however, more complex objects may be precom- 
puted based on the simulation model at hand. Once the data sources have 
been identified, the simulation is written as usual, except that the identified 
source objects are written as a separate SIMSCUIPT program. This results 
in the simulation being implemented as a data generator program and a 
simulation program. Two C language fund inns must be called by both the 
simulation program and the generator program to install the communica- 
tions interrupt handler and to identify each of the generators participating 
in the simulation. Kach of the SIMSCUIPT programs must also contain a 
function that the C routine will call to transfer generated data to and from 
SIMSCUIPT variables. Kach of these functions are described below. 

C Functions 

• inst_int( host, mode ) 

- char + hosl The name of the host running the receiver program 
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- char + mnde Must 1 11 ;i I “server" nr “rcrcivrr" for I ho server and 
receiver programs respectively. 

This is a (’ routine that is called by both I hr receiver and server 
programs, (’ailed only mice, ami before any Hie link server function 
described below. Ihis luiiclioii opens a socket lor reading, installs the 
communical ions interrupt handler. and i nil ializes I he variables used 
to maintain the server information. 

• link j?e rvor (server, host, mode) - 

- char ^service - The name of the service being identified 

- char *hosl - The name of the host where the server resides 

- char + mnde - Musi, equal “server** or “receiver*' for the server and 
receiver programs respectively. 

I his (’ ron line is tailed by both the receiver and server processes 
to identify the services that are being; used in this simulation. 'The 
function creates a record of the identified server's information, opens 
a. sending; socket for the server, initializes the list of messages to that 
server as null, and adds the server to the list of participating servers. 

• request (service, command ) 

- char + service The name of the service being requested 

- i ill, command A command to be sent lo the server. This com- 
mand integer is not examined by the toolkit: it is completely 
deli liable by the model and server developers. 

This C routine is called by the simulation module to request more 
data items from a server. After the request is sent, control is return 
to the simulation .software. When the requested data are received, the 
simulation code will he interrupted and the toolkit will make a call to 
user provided accept.dala routine, described below. 

SIMSCRIPT Functions 

• accept.datn given service, data, length 

- service - texl variable coni -aiding the name of the service sup- 
plying the data. This field is used In route' data to the correct 
inventory. 



- (lata- memory lor ol> jerl s created . This memory spare will be for- 
matted by I lie server In hold the data in the correct SIMSCRIPT 
format. 

-- It'll ^1. li This is I ho buigt It in bytes of the data area. 

The ac.cept.data function is called by the C routine that performs 
the socket reads when new data arrives. Moralise the arrival of data 
causes and interrupt If) he serviced and this funclion is called during 
that interrupt, the code may be executed at any time. This will have 
an impart on I he simulator's use of pointers or indit es to the inventory 
of data. 

• filLrequest given service, data, yielding length 

- service - Text variable* containing I lie name of the service being 
requested. 

- data - memory for objects being created. This is unformatted 
memory and ran he interpreted and Idled according lo the objects 
being requested. 

-- length Integer variulde ret tiniing t he length in bytes of the data 
to bo supplied. 

This SIMSCRIPT fund ion is I In* server complement to the accept 
data funclion. When a request for objects is received by the commu- 
nications handler, this function is called to fill the request. As before, 

because I he con mirations an* interrupt driven, this function can be 

called at any lime. 

After the receiver code has been moved to a separate program, addi- 
tional SIMSCIUPT rode will have to lie added to the receiver program to 
manage the remotely generated data. This additional code keeps track of 
the available inventories of remotely generated data, placing requests for 
additional data when the local inventory falls below some threshold. Mow 
this threshold is calculated is discussed in a later section. 

The receiver program may itself be a data source, for example generating 
simulation data that are sent to additional programs that provide statisical 
analysis and summary reports or to other servers for graphical display. 
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5 Timing Data and Decomposition Considerations 

Message* passing overhead must lx* considered when designing any typo of 
dislri but ed processing. To decide if any speed ups ran lx* expected for a 
simulat ion using I lx* server model deenmposit ion . some' analysis for message 
passing l imns verse local computational costs should hr performed. The 
following definitions are used to perform this analysis. 

• CIO - Overhead induced by servicing a communications interrupt. 
This includes the lime required to transfer data from the commu- 
nications buffer to a SIM SCRIPT variable. 

• CSO - Overhead induct'd by actively sending a message to another 
server. 

• CTT - Time lor a command message to travel be! ween two hosts. 

• irr r - Time for a data message to travel hr! wren two hosts. 

• MS r - Minimum time required to supply objects. 

• OS - Order size. The number of items shipped in each order. 

• ECU - Expected consumption rate for generated items. 

• DCC - Distributed computation cosl . 

• LCC - Local computational cost for generating one ob ject 

• HOT - Remote generation time. The time required by the server to 
generate the values. This value is determined by OS and LCC. 

• THO l ime between orders. 

Assume that the generators have precomputed more of the objects than 
will be requested so that I lie lime that I lie generator will spend processing 
a request is 0. With this assumption we can define 
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Clearly the formulae' below must hold or il will always be faster to com 
pule the ob jects locally. 


os > MSI * i:ci? ( i) 

OS t L( '(' > CIO f ('SO (5) 

II) many eases, unless LCC (the* cost of computing ihe data, locally) is 
significant, OS will have t.o be large to make' this approach feasible. Practi- 
cally speaking, since for nmst. simulations the actual consumption rate can 
vary, OS * LCC should bo significantly larger than CIO + ('SO, 

In order to assist in determining the potential effectiveness of using t h is 
approach for distributed simulation, some timing data was collected for LCC. 
CIO, CSC), and (T. The* variable* KCICis model dependent and, with the 
other variables fixed, OS can be* determined. 

Timing data- for message j>assing among Sun workstations connecteel via 
ethernet networks and bridge's was collected. I he* lime's required for message 
passing are not significantly a Her ted by message* length as long as are less 
than the maximum packet Irngt li for tin* ethrrnM (ir»00 hytes). Messages 
were passed between processors that re*siele*el on the* same physical network 
and those* e>n separate*, bridged networks. As ran be* seen below, messages 
that had to go across bridge's (oe>k t wice* as long as those that stayed on a 
single network. All data was collected when I lie* network was lightly loaded. 


Packet Size: 100 bytes 


Number of Packets Single Net (seconds) Pridgeel Nets (seconds) 


100 
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Packet Si/e*: M)0 bytes 

Number of Packets 

Single* Ne*t (seconds) 

H ridged Nets (seconds) 

100 

I 

:i 

r»oo 

0 

1 s 

1,000 

M 

20 

.*»,()()() 

OS 

Vtf 

10,000 

1 IS 
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r, o.ooo 

r» i 
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Packet Size: 1000 bytes 


Number of Packets 

Single Net (seconds) 

K ridged Nets (seconds) 

100 

2 

:i 

r>oo 

x 

20 

1000 

u; 

:u 

5,000 

70 

171 

(0,000 

1 50 

:m 

50,000 

700 

1,721 


Analysis of the above data gives the following table of average times and 
throughput rates, limes are given in seconds/byle and I h rough put is given 
in byt.es/second . 


Packet Size 

Average ' 
lut ra-net 

Pi me (sec.) 
Inter* net 

'I’ll muglipul ( byl.es/sec ) 
lutra-net In ter- net 

100 

0.000087 

0.000 KM 

11/100 

5,100 

5(K) 

0. 000022 

0.0000*17 

'10,000 

2 i ,500 

1 ,000 

0.000015 

O.OOOCKM 

(;:!,«()() 

20,000 


To obtain values for the variables L('(\ CSO, and CIO, the UNIX prof 
command was used. This is a standard UNIX tool that profiles executable 
code and generates reports on number of times each function is called, time 
spent during each call, and total time spenl in the function during program 
execution. I* or more information on the prof command see [oj. 

6 Example 

As an example, consider I he generation of normally distributed random 
numbers. The machines used are Sun 3/00 workstations with 8 megabytes 
oT memory. !,(•(’ was found to be 0.1 ms; the ( SO and OK) were both 0.025 
ms. 'The table below shows values for OS with corresponding KCK values. 
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EC It /mi mite 

OS 

Orders 

rc;t 

THO 

MST 

10,000 

50 

200 

0.005 

o.50 

0.002 


100 

100 

0.010 

0.00 

0.015 


500 

20 

0.050 

:i.oo 

0.001 


1 ,000 

10 

0.100 

0.00 

0.121 


5,000 

2 

0.500 

50.00 

0.001 

100,000 

50 

2,000 

0.005 

o.o;i 

0.002 


100 

1 ,000 

0.010 

0.00 

0.015 


500 

200 

0.050 

0.50 

0.001 


1 ,000 

100 

0.100 

0.00 

0.121 


5,000 

20 

0.500 

5.00 

0.001 

500,000 

50 

10,000 

0.005 

0.000 

0.002 


100 

5.000 

0.010 

0.012 

0.015 

- 

500 

1 ,000 

0.050 

0.000 

0JH5I 


1 ,000 

500 

0.100 

0.120 

0.121 


5,000 

100 

0.500 

0.(500 

0.001 

1,000,000 

50 

20,000 

0.005 

0.005 

0.002 


100 

10,000 

0.010 

0.000 

0.015 


500 

2.000 

0.050 

0.050 

0.00 1 


1 ,000 

1.000 

0.100 

0.000 

0.121 


5,000 

200 

0.500 

o.:kio 

0.001 


Two points ran be made from this table. I'irst, when the* order size 
becomes large, the communication l inn* for transferring l.lie numbers from 
the server to the simulator heroines larger than t lie time required to compute 
the values locally. As long; as the lime needed to consume the numbers is 
greater than the MST, a speedup is possible with the server decomposition. 

Secondly, when the KCR becomes very large, the remote server cannot 
kcej) up with the KCK, the receiver will he forced to wait for the server to 
generate numbers. If the time spent waiting is significantly less than what 
is required to compute I lie values locally, then the server decomposition still 
provides speedup. 

7 Conclusion 

The toolkit that we have developed can he use to develop distributed simula- 
tion applications without having to invest in new environments or training. 
SIMSCRIPT and the UNIX operating system are widely available, allow- 
ing easy access to these tools. The toolkit is composed of 050 lines of C 
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code and requires about approximately 50 lines of additional SIMSCRIPT 
code per server/receiver pair to be added to the simulation model. The user 
of these tools need only be concerned with three C function calls and two 
SIMSCRIPT routines, so the complexity of the simulation program is not 
significantly affected. 

Use of these tools requires decomposing a model into components in 
which information flow is unidirectional such as traffic arrival generators, 
statistical analysis procedures, or graphical displays. 
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